skellam (Skellam distribution)#

The Skellam distribution models the difference of two independent Poisson counts: (X = Y_1 - Y_2), where (Y_1 \sim \text{Pois}(\mu_1)) and (Y_2 \sim \text{Pois}(\mu_2)).

It is a natural model for net counts: wins minus losses, arrivals minus departures, goals for minus goals against, etc.

This notebook uses the same parameterization as scipy.stats.skellam:

  • mu1 = mean/rate of the first Poisson process ((\mu_1 \ge 0))

  • mu2 = mean/rate of the second Poisson process ((\mu_2 \ge 0))

Learning goals#

By the end you should be able to:

  • recognize problems where a Skellam model makes sense (and when it doesn’t)

  • write down the PMF/CDF and key properties

  • derive the mean, variance, and log-likelihood

  • sample from the distribution using NumPy only

  • visualize PMF/CDF and validate formulas with Monte Carlo simulation

  • use scipy.stats.skellam for computation and fitting

Table of contents#

  1. Title & Classification

  2. Intuition & Motivation

  3. Formal Definition

  4. Moments & Properties

  5. Parameter Interpretation

  6. Derivations

  7. Sampling & Simulation

  8. Visualization

  9. SciPy Integration

  10. Statistical Use Cases

  11. Pitfalls

  12. Summary

import math
import os

import numpy as np
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio
from plotly.subplots import make_subplots

from scipy import optimize, special, stats

pio.templates.default = "plotly_white"
pio.renderers.default = os.environ.get("PLOTLY_RENDERER", "notebook")

np.set_printoptions(precision=6, suppress=True)
rng = np.random.default_rng(42)

1) Title & Classification#

Name: skellam (Skellam distribution)
Type: Discrete
Support: (k \in \mathbb{Z})
Parameters: (\mu_1, \mu_2)
Parameter space: (\mu_1 \ge 0,\ \mu_2 \ge 0) (not both zero for a non-degenerate distribution)

In SciPy, skellam also accepts a loc shift. In this notebook we focus on the standard, unshifted case (loc = 0).

2) Intuition & Motivation#

Think of two independent counting processes over the same time window:

  • (Y_1): “positive” events (arrivals, goals for, wins, upward jumps)

  • (Y_2): “negative” events (departures, goals against, losses, downward jumps)

The Skellam variable (X = Y_1 - Y_2) is the net balance.

Typical real-world use cases#

  • Sports score differential: if each team’s goals are modeled as independent Poisson counts, the goal difference is Skellam.

  • A/B event streams: net count of two independent event types in logs (e.g., adds − deletes).

  • Queueing / inventory: arrivals minus services (over short windows where rates are approximately constant).

  • Change scores: differences of counts in two conditions or two periods (when each is Poisson-like).

Relations to other distributions#

  • Poisson: if (\mu_2 = 0), then (X \sim \text{Pois}(\mu_1)); if (\mu_1 = 0), then (-X \sim \text{Pois}(\mu_2)).

  • Normal approximation: for large (\mu_1 + \mu_2), (X) is approximately (\mathcal{N}(\mu_1-\mu_2,\ \mu_1+\mu_2)).

  • Poisson–Binomial connection: if (N = Y_1 + Y_2), then (N \sim \text{Pois}(\mu_1+\mu_2)) and (Y_1 \mid N \sim \text{Bin}(N,\ \mu_1/(\mu_1+\mu_2))). Since (X = 2Y_1 - N), this gives an alternative view of Skellam.

  • Additivity: differences of independent Poissons add: if (X_i = Y_{1,i}-Y_{2,i}) with independent Poisson components, then (\sum_i X_i) is Skellam with summed rates.
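The Poisson–Binomial connection can be verified numerically: conditioning on (N = Y_1 + Y_2) and summing over the valid values of (N) recovers the Skellam PMF. A quick sketch (the parameter values and truncation point are arbitrary choices):

```python
import numpy as np
from scipy import stats

mu1, mu2, k = 3.0, 2.0, 2
lam = mu1 + mu2
p = mu1 / lam

# X = 2*Y1 - N = k requires N >= |k| and N + k even, with Y1 = (N + k) / 2.
ns = np.arange(abs(k), 200)
ns = ns[(ns + k) % 2 == 0]
via_binomial = np.sum(stats.poisson.pmf(ns, lam) * stats.binom.pmf((ns + k) // 2, ns, p))

direct = stats.skellam.pmf(k, mu1, mu2)
# The two should agree up to the (negligible) truncated tail of the sum over N.
```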

3) Formal Definition#

Let (Y_1 \sim \text{Pois}(\mu_1)) and (Y_2 \sim \text{Pois}(\mu_2)) be independent. Define [ X = Y_1 - Y_2. ]

PMF#

For (k \in \mathbb{Z}) and (\mu_1, \mu_2 > 0), the probability mass function is [ \mathbb{P}(X = k) = e^{-(\mu_1+\mu_2)}\left(\frac{\mu_1}{\mu_2}\right)^{k/2} I_{|k|}\!\left(2\sqrt{\mu_1\mu_2}\right), ] where (I_{\nu}(\cdot)) is the modified Bessel function of the first kind.
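As a sanity check, the Bessel-function form can be compared against the direct convolution sum (\mathbb{P}(X=k)=\sum_{j\ge 0}\mathbb{P}(Y_1=j+k)\,\mathbb{P}(Y_2=j)). A minimal sketch (parameter values arbitrary):

```python
import numpy as np
from scipy import stats

mu1, mu2, k = 4.0, 2.5, 3

# Truncate the infinite sum over j; the tail beyond 200 is negligible at these rates.
j = np.arange(0, 200)
via_convolution = np.sum(stats.poisson.pmf(j + k, mu1) * stats.poisson.pmf(j, mu2))
via_bessel = stats.skellam.pmf(k, mu1, mu2)
```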

Boundary cases are intuitive:

  • if (\mu_2=0), then (X) is Poisson on ({0,1,2,\dots})

  • if (\mu_1=0), then (X) is the negative of a Poisson on ({0,-1,-2,\dots})

  • if (\mu_1=\mu_2=0), then (X=0) almost surely

CDF#

The cumulative distribution function is [ F(k) = \mathbb{P}(X \le k) = \sum_{j=-\infty}^{k} \mathbb{P}(X=j). ] There is no simple closed form in general; in practice it is computed numerically (e.g., by SciPy).
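One way to validate a numerical CDF is to cumulatively sum the PMF over a grid wide enough to cover essentially all the mass (a sketch; the grid bounds are arbitrary):

```python
import numpy as np
from scipy import stats

mu1, mu2 = 4.0, 2.5
ks = np.arange(-40, 41)
cdf_scipy = stats.skellam.cdf(ks, mu1, mu2)
cdf_sum = np.cumsum(stats.skellam.pmf(ks, mu1, mu2))  # tail below -40 is negligible
max_diff = float(np.max(np.abs(cdf_scipy - cdf_sum)))
```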

4) Moments & Properties#

Let (X \sim \text{Skellam}(\mu_1, \mu_2)).

Mean and variance#

[ \mathbb{E}[X] = \mu_1 - \mu_2,\qquad \mathrm{Var}(X) = \mu_1 + \mu_2. ]

Skewness and kurtosis#

Assuming (\mu_1+\mu_2>0), the skewness is [ \text{skew}(X) = \frac{\mu_1 - \mu_2}{(\mu_1+\mu_2)^{3/2}}, ] and the excess kurtosis is [ \text{excess kurt}(X) = \frac{1}{\mu_1+\mu_2}, ] so the full kurtosis is (3 + 1/(\mu_1+\mu_2)).

MGF, characteristic function, and cumulants#

The moment generating function exists for all real (t) and is [ M_X(t) = \mathbb{E}[e^{tX}] = \exp\big(\mu_1(e^t-1) + \mu_2(e^{-t}-1)\big). ] The characteristic function is [ \varphi_X(t) = \mathbb{E}[e^{itX}] = \exp\big(\mu_1(e^{it}-1) + \mu_2(e^{-it}-1)\big). ] The cumulant generating function (K(t)=\log M_X(t)) is [ K(t) = \mu_1(e^t-1) + \mu_2(e^{-t}-1). ] Differentiating shows a neat pattern: odd-order cumulants equal (\mu_1-\mu_2), even-order cumulants equal (\mu_1+\mu_2).
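The cumulant pattern can be checked by taking finite differences of (K(t)) at (t=0) (a quick numerical sketch; the step size is an arbitrary choice):

```python
import math

mu1, mu2 = 6.0, 2.0

def K(t):
    # Cumulant generating function of Skellam(mu1, mu2)
    return mu1 * (math.exp(t) - 1.0) + mu2 * (math.exp(-t) - 1.0)

h = 1e-4
k1 = (K(h) - K(-h)) / (2.0 * h)            # first cumulant, should be near mu1 - mu2 = 4
k2 = (K(h) - 2.0 * K(0.0) + K(-h)) / h**2  # second cumulant, should be near mu1 + mu2 = 8
```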

Entropy#

The (Shannon) entropy is [ H(X) = -\sum_{k\in\mathbb{Z}} \mathbb{P}(X=k)\,\log \mathbb{P}(X=k). ] There is no simple closed form in general, but it can be approximated accurately by truncating the tails.

Other useful properties#

  • Symmetry: if (\mu_1=\mu_2), the distribution is symmetric around 0.

  • Additivity: if (X_1\sim\text{Skellam}(\mu_{1a},\mu_{2a})) and (X_2\sim\text{Skellam}(\mu_{1b},\mu_{2b})) independent, then (X_1+X_2\sim\text{Skellam}(\mu_{1a}+\mu_{1b},\ \mu_{2a}+\mu_{2b})).

  • Normal approximation: when (\mu_1+\mu_2) is large, a Normal approximation is often excellent.
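The quality of the Normal approximation can be gauged by comparing the exact PMF to the matching Normal density (a sketch with arbitrarily chosen large rates):

```python
import numpy as np
from scipy import stats

mu1, mu2 = 60.0, 40.0  # large mu1 + mu2, so the approximation should be good
ks = np.arange(-15, 56)
exact = stats.skellam.pmf(ks, mu1, mu2)
approx = stats.norm.pdf(ks, loc=mu1 - mu2, scale=np.sqrt(mu1 + mu2))
max_abs_err = float(np.max(np.abs(exact - approx)))  # small relative to the peak PMF (~0.04)
```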

def _validate_mu(mu, name):
    mu_float = float(mu)
    if mu_float < 0.0:
        raise ValueError(f"{name} must be >= 0")
    return mu_float


def _validate_mu1_mu2(mu1, mu2):
    mu1 = _validate_mu(mu1, "mu1")
    mu2 = _validate_mu(mu2, "mu2")
    return mu1, mu2


def skellam_logpmf(k, mu1, mu2):
    """Log PMF for Skellam(mu1, mu2) for integer k. Returns -inf outside support."""
    mu1, mu2 = _validate_mu1_mu2(mu1, mu2)
    k_arr = np.asarray(k)

    out = np.full(k_arr.shape, -np.inf, dtype=float)

    k_int = k_arr.astype(int)
    is_int = k_int == k_arr
    if not np.any(is_int):
        return out

    # Degenerate and boundary cases
    if mu1 == 0.0 and mu2 == 0.0:
        out[is_int & (k_int == 0)] = 0.0
        return out

    if mu1 == 0.0:
        mask = is_int & (k_int <= 0)
        out[mask] = stats.poisson.logpmf(-k_int[mask], mu2)
        return out

    if mu2 == 0.0:
        mask = is_int & (k_int >= 0)
        out[mask] = stats.poisson.logpmf(k_int[mask], mu1)
        return out

    kv = k_int[is_int]
    v = np.abs(kv)
    z = 2.0 * math.sqrt(mu1 * mu2)

    # Use the exponentially scaled Bessel function for numerical stability:
    # ive(v, z) = exp(-|z|) * I_v(z)  => log I_v(z) = log ive(v, z) + |z|
    log_iv = np.log(special.ive(v, z)) + abs(z)

    log_ratio = math.log(mu1) - math.log(mu2)
    out[is_int] = -(mu1 + mu2) + 0.5 * kv * log_ratio + log_iv
    return out


def skellam_pmf(k, mu1, mu2):
    return np.exp(skellam_logpmf(k, mu1, mu2))


def skellam_moments(mu1, mu2):
    mu1, mu2 = _validate_mu1_mu2(mu1, mu2)
    mean = mu1 - mu2
    var = mu1 + mu2

    if var == 0.0:
        return {
            "mean": 0.0,
            "var": 0.0,
            "skew": float("nan"),
            "excess_kurt": float("nan"),
            "kurt": float("nan"),
        }

    skew = (mu1 - mu2) / (var**1.5)
    excess_kurt = 1.0 / var
    return {
        "mean": mean,
        "var": var,
        "skew": skew,
        "excess_kurt": excess_kurt,
        "kurt": 3.0 + excess_kurt,
    }


def skellam_mgf(t, mu1, mu2):
    mu1, mu2 = _validate_mu1_mu2(mu1, mu2)
    t = np.asarray(t, dtype=float)
    return np.exp(mu1 * (np.exp(t) - 1.0) + mu2 * (np.exp(-t) - 1.0))


def skellam_cf(t, mu1, mu2):
    mu1, mu2 = _validate_mu1_mu2(mu1, mu2)
    t = np.asarray(t, dtype=float)
    return np.exp(mu1 * (np.exp(1j * t) - 1.0) + mu2 * (np.exp(-1j * t) - 1.0))


def sample_skewness(x):
    x = np.asarray(x, dtype=float)
    m = x.mean()
    c = x - m
    m2 = np.mean(c**2)
    if m2 == 0.0:
        return float("nan")
    m3 = np.mean(c**3)
    return float(m3 / (m2**1.5))


def sample_excess_kurtosis(x):
    x = np.asarray(x, dtype=float)
    m = x.mean()
    c = x - m
    m2 = np.mean(c**2)
    if m2 == 0.0:
        return float("nan")
    m4 = np.mean(c**4)
    return float(m4 / (m2**2) - 3.0)


def skellam_entropy(mu1, mu2, *, tail_prob=1e-12, base=math.e):
    """Approximate entropy by truncating tails with total probability `tail_prob`."""
    mu1, mu2 = _validate_mu1_mu2(mu1, mu2)
    if mu1 == 0.0 and mu2 == 0.0:
        return 0.0

    tail_prob = float(tail_prob)
    if not (0.0 < tail_prob < 1.0):
        raise ValueError("tail_prob must be in (0, 1)")

    lo = stats.skellam.ppf(tail_prob / 2.0, mu1, mu2)
    hi = stats.skellam.ppf(1.0 - tail_prob / 2.0, mu1, mu2)

    if not (np.isfinite(lo) and np.isfinite(hi)):
        mean = mu1 - mu2
        sd = math.sqrt(mu1 + mu2)
        lo = math.floor(mean - 20.0 * sd)
        hi = math.ceil(mean + 20.0 * sd)

    lo = int(math.floor(lo))
    hi = int(math.ceil(hi))

    ks = np.arange(lo, hi + 1)
    logp = stats.skellam.logpmf(ks, mu1, mu2)
    p = np.exp(logp)

    h_nats = -np.sum(p * logp)
    return float(h_nats / math.log(base))


def skellam_mom_estimates(x):
    """Method-of-moments estimates from sample mean and variance."""
    x = np.asarray(x)
    m = float(x.mean())
    v = float(x.var(ddof=0))

    mu1_hat = 0.5 * (v + m)
    mu2_hat = 0.5 * (v - m)

    return max(mu1_hat, 0.0), max(mu2_hat, 0.0)

mu1, mu2 = 8.0, 5.0
mom = skellam_moments(mu1, mu2)

samples = rng.poisson(mu1, size=300_000) - rng.poisson(mu2, size=300_000)

{
    "formula_mean": mom["mean"],
    "mc_mean": float(samples.mean()),
    "formula_var": mom["var"],
    "mc_var": float(samples.var(ddof=0)),
    "formula_skew": mom["skew"],
    "mc_skew": sample_skewness(samples),
    "formula_excess_kurt": mom["excess_kurt"],
    "mc_excess_kurt": sample_excess_kurtosis(samples),
    "entropy_bits (approx)": skellam_entropy(mu1, mu2, base=2),
}
{'formula_mean': 3.0,
 'mc_mean': 3.001663333333333,
 'formula_var': 13.0,
 'mc_var': 12.952040566655556,
 'formula_skew': 0.06400386879521874,
 'mc_skew': 0.06958811620543011,
 'formula_excess_kurt': 0.07692307692307693,
 'mc_excess_kurt': 0.06572870545169307,
 'entropy_bits (approx)': 3.896700141704304}

5) Parameter Interpretation#

You can think of (\mu_1) as the expected number of positive events and (\mu_2) as the expected number of negative events.

  • Location / drift: (\mathbb{E}[X] = \mu_1 - \mu_2). Increasing (\mu_1) shifts mass to the right; increasing (\mu_2) shifts mass to the left.

  • Dispersion: (\mathrm{Var}(X) = \mu_1 + \mu_2). Increasing both parameters makes the distribution wider.

  • Symmetry: if (\mu_1=\mu_2), the distribution is symmetric around 0.

  • Asymptotics: for large (\mu_1+\mu_2), the PMF looks increasingly bell-shaped (Normal approximation).

The plot below compares several parameter settings.

param_sets = [
    (5.0, 5.0),
    (10.0, 5.0),
    (5.0, 10.0),
    (20.0, 20.0),
]

alpha = 0.001
lo = int(min(stats.skellam.ppf(alpha, m1, m2) for m1, m2 in param_sets))
hi = int(max(stats.skellam.ppf(1 - alpha, m1, m2) for m1, m2 in param_sets))
ks = np.arange(lo, hi + 1)

fig = go.Figure()
for m1, m2 in param_sets:
    pmf = stats.skellam.pmf(ks, m1, m2)
    fig.add_trace(
        go.Scatter(
            x=ks,
            y=pmf,
            mode="lines+markers",
            name=f"mu1={m1:g}, mu2={m2:g}",
        )
    )

fig.update_layout(
    title="Skellam PMF for different (mu1, mu2)",
    xaxis_title="k",
    yaxis_title="P(X=k)",
)
fig.show()

6) Derivations#

A) Expectation#

Using linearity of expectation and the fact that a Poisson((\mu)) variable has mean (\mu): [ \mathbb{E}[X] = \mathbb{E}[Y_1 - Y_2] = \mathbb{E}[Y_1] - \mathbb{E}[Y_2] = \mu_1 - \mu_2. ]

B) Variance#

Independence gives (\mathrm{Cov}(Y_1, Y_2)=0). Poisson variables have variance equal to their mean, so [ \mathrm{Var}(X) = \mathrm{Var}(Y_1 - Y_2) = \mathrm{Var}(Y_1) + \mathrm{Var}(Y_2) = \mu_1 + \mu_2. ]

(You can also get both results by differentiating the cumulant generating function (K(t)=\mu_1(e^t-1)+\mu_2(e^{-t}-1)) at (t=0).)

C) Likelihood#

For observations (x_1,\dots,x_n\in\mathbb{Z}), the likelihood is [ L(\mu_1,\mu_2) = \prod_{i=1}^{n} \mathbb{P}(X=x_i\mid \mu_1,\mu_2), ] and the log-likelihood (for (\mu_1,\mu_2>0)) is [ \ell(\mu_1,\mu_2) = \sum_{i=1}^{n}\Big[-(\mu_1+\mu_2) + \tfrac{x_i}{2}\log(\mu_1/\mu_2) + \log I_{|x_i|}\big(2\sqrt{\mu_1\mu_2}\big)\Big]. ] There is no general closed-form MLE; we typically use numerical optimization, often initialized with method-of-moments estimates.

mu1_true, mu2_true = 7.0, 4.0
data = rng.poisson(mu1_true, size=3_000) - rng.poisson(mu2_true, size=3_000)

mu1_mom, mu2_mom = skellam_mom_estimates(data)


def nll(log_params):
    mu1 = math.exp(log_params[0])
    mu2 = math.exp(log_params[1])
    return -stats.skellam.logpmf(data, mu1, mu2).sum()


x0 = np.log([max(mu1_mom, 1e-6), max(mu2_mom, 1e-6)])
res = optimize.minimize(nll, x0=x0, method="L-BFGS-B")

mu1_mle, mu2_mle = np.exp(res.x)

{
    "mu1_true": mu1_true,
    "mu2_true": mu2_true,
    "mu1_mom": mu1_mom,
    "mu2_mom": mu2_mom,
    "mu1_mle": float(mu1_mle),
    "mu2_mle": float(mu2_mle),
    "success": bool(res.success),
}
{'mu1_true': 7.0,
 'mu2_true': 4.0,
 'mu1_mom': 6.923159111111111,
 'mu2_mom': 3.904492444444444,
 'mu1_mle': 6.928769718257335,
 'mu2_mle': 3.9101030817961346,
 'success': True}

7) Sampling & Simulation#

The defining construction immediately gives a simple sampler:

  1. draw (Y_1 \sim \text{Pois}(\mu_1))

  2. draw (Y_2 \sim \text{Pois}(\mu_2)) independently

  3. return (X = Y_1 - Y_2)

This sampler needs only NumPy, because NumPy’s random generator provides a Poisson sampler (rng.poisson).

def sample_skellam_numpy(mu1, mu2, size, *, rng):
    mu1, mu2 = _validate_mu1_mu2(mu1, mu2)
    y1 = rng.poisson(lam=mu1, size=size)
    y2 = rng.poisson(lam=mu2, size=size)
    return y1 - y2

mu1, mu2 = 3.5, 2.0
x = sample_skellam_numpy(mu1, mu2, size=200_000, rng=rng)

{
    "sample_mean": float(x.mean()),
    "theory_mean": mu1 - mu2,
    "sample_var": float(x.var(ddof=0)),
    "theory_var": mu1 + mu2,
}
{'sample_mean': 1.49472,
 'theory_mean': 1.5,
 'sample_var': 5.4990721216,
 'theory_var': 5.5}

8) Visualization#

We’ll visualize:

  • the PMF over a high-probability range (a finite window covering most mass)

  • the CDF as a step function

  • Monte Carlo samples vs the exact PMF

mu1, mu2 = 9.0, 6.0

alpha = 0.001
k_min = int(stats.skellam.ppf(alpha, mu1, mu2))
k_max = int(stats.skellam.ppf(1 - alpha, mu1, mu2))
ks = np.arange(k_min, k_max + 1)

pmf = stats.skellam.pmf(ks, mu1, mu2)
cdf = stats.skellam.cdf(ks, mu1, mu2)

fig_pmf = go.Figure()
fig_pmf.add_trace(go.Bar(x=ks, y=pmf, name="PMF"))
fig_pmf.update_layout(
    title=f"Skellam PMF (mu1={mu1}, mu2={mu2})",
    xaxis_title="k",
    yaxis_title="P(X=k)",
)
fig_pmf.show()

fig_cdf = go.Figure()
fig_cdf.add_trace(go.Scatter(x=ks, y=cdf, mode="lines", line_shape="hv", name="CDF"))
fig_cdf.update_layout(
    title=f"Skellam CDF (mu1={mu1}, mu2={mu2})",
    xaxis_title="k",
    yaxis_title="P(X≤k)",
)
fig_cdf.show()

mc = sample_skellam_numpy(mu1, mu2, size=200_000, rng=rng)
mask = (mc >= k_min) & (mc <= k_max)
counts = np.bincount(mc[mask] - k_min, minlength=len(ks))
pmf_hat = counts / mc.size

fig_mc = go.Figure()
fig_mc.add_trace(go.Bar(x=ks, y=pmf_hat, name="Monte Carlo", opacity=0.6))
fig_mc.add_trace(go.Scatter(x=ks, y=pmf, mode="markers+lines", name="Exact PMF"))
fig_mc.update_layout(
    title=f"Monte Carlo vs exact PMF (mu1={mu1}, mu2={mu2})",
    xaxis_title="k",
    yaxis_title="Probability",
)
fig_mc.show()

{
    "prob_mass_in_plot_window": float(pmf.sum()),
    "mc_mass_in_plot_window": float(((mc >= k_min) & (mc <= k_max)).mean()),
}
{'prob_mass_in_plot_window': 0.9984984715902749,
 'mc_mass_in_plot_window': 0.998655}

9) SciPy Integration#

SciPy provides a numerically robust implementation via scipy.stats.skellam.

  • Use pmf, cdf, sf, rvs, logpmf, logcdf, …

  • Many rv_discrete objects (including skellam) do not expose a .fit() method. In SciPy 1.9+, you can use the generic scipy.stats.fit for MLE with bounds.

  • skellam also has a loc parameter (integer shift). In most modeling contexts you want loc=0.

skellam = stats.skellam

mu1, mu2 = 5.0, 3.0
alpha = 0.001
k_min = int(skellam.ppf(alpha, mu1, mu2))
k_max = int(skellam.ppf(1 - alpha, mu1, mu2))
ks = np.arange(k_min, k_max + 1)

pmf = skellam.pmf(ks, mu1, mu2)
cdf = skellam.cdf(ks, mu1, mu2)
samples = skellam.rvs(mu1, mu2, size=10_000, random_state=rng)

mom = skellam_moments(mu1, mu2)

# Fit with scipy.stats.fit (MLE)
data = skellam.rvs(7.0, 4.0, size=2_000, random_state=rng)
fit_res = stats.fit(
    skellam,
    data,
    bounds={"mu1": (0.0, 50.0), "mu2": (0.0, 50.0), "loc": (0.0, 0.0)},
)

{
    "pmf_sum_on_grid": float(pmf.sum()),
    "cdf_last_on_grid": float(cdf[-1]),
    "sample_mean": float(samples.mean()),
    "theory_mean": mom["mean"],
    "sample_var": float(samples.var(ddof=0)),
    "theory_var": mom["var"],
    "fit_success": bool(fit_res.success),
    "fit_mu1": float(fit_res.params.mu1),
    "fit_mu2": float(fit_res.params.mu2),
    "fit_loc": float(fit_res.params.loc),
}
{'pmf_sum_on_grid': 0.998847831553688,
 'cdf_last_on_grid': 0.9992059941388924,
 'sample_mean': 1.999,
 'theory_mean': 2.0,
 'sample_var': 7.929998999999999,
 'theory_var': 8.0,
 'fit_success': True,
 'fit_mu1': 7.284579179865933,
 'fit_mu2': 4.275579077706104,
 'fit_loc': 0.0}

10) Statistical Use Cases#

A) Hypothesis testing (difference of Poisson means)#

If you have a model for two independent Poisson counts, Skellam gives the exact distribution of the difference. This can be used for exact p-values on a score differential or “net count”.

B) Bayesian modeling (two Poisson rates)#

In many applications you model (Y_1) and (Y_2) directly with Poisson likelihoods and put Gamma priors on (\mu_1, \mu_2). The induced predictive distribution for (X=Y_1-Y_2) is then a (mixture of) Skellam.

C) Generative modeling (integer-valued innovations)#

Skellam is a convenient choice when you need integer noise that can be negative, e.g. in state-space models or random walks with drift.

# Example: test whether there is an advantage (H0: mu1 = mu2)
mu0 = 1.4
d_obs = 3

# Two-sided p-value under a symmetric Skellam(mu0, mu0)
p_upper = stats.skellam.sf(d_obs - 1, mu0, mu0)  # P(X >= d_obs)
p_lower = stats.skellam.cdf(-d_obs, mu0, mu0)  # P(X <= -d_obs)
p_two = p_upper + p_lower

# Conditional exact test given N = Y1 + Y2 (does not depend on mu0 under H0)
y1, y2 = 4, 1
n = y1 + y2
binom_res = stats.binomtest(y1, n=n, p=0.5, alternative="two-sided")

{
    "skellam_two_sided_p": float(p_two),
    "conditional_binomtest_p": float(binom_res.pvalue),
    "note": "Under H0 mu1=mu2, conditioning on N=Y1+Y2 gives a Binomial test.",
}
{'skellam_two_sided_p': 0.12687630672173225,
 'conditional_binomtest_p': 0.375,
 'note': 'Under H0 mu1=mu2, conditioning on N=Y1+Y2 gives a Binomial test.'}
# Simple Bayesian example (Gamma–Poisson) + posterior predictive for the score difference
y1_obs = np.array([2, 1, 0, 3, 1, 2])
y2_obs = np.array([1, 0, 1, 1, 2, 1])
n_games = len(y1_obs)

# Prior: mu ~ Gamma(alpha, beta) with beta = rate (so scale = 1/beta)
alpha1, beta1 = 1.0, 1.0
alpha2, beta2 = 1.0, 1.0

post_alpha1 = alpha1 + y1_obs.sum()
post_beta1 = beta1 + n_games
post_alpha2 = alpha2 + y2_obs.sum()
post_beta2 = beta2 + n_games

m = 50_000
mu1_s = rng.gamma(shape=post_alpha1, scale=1.0 / post_beta1, size=m)
mu2_s = rng.gamma(shape=post_alpha2, scale=1.0 / post_beta2, size=m)

# Posterior predictive for next-game difference (mixture of Skellam)
y1_pred = rng.poisson(mu1_s)
y2_pred = rng.poisson(mu2_s)
d_pred = y1_pred - y2_pred

k_lo = int(np.percentile(d_pred, 0.5))
k_hi = int(np.percentile(d_pred, 99.5))
ks = np.arange(k_lo, k_hi + 1)
mask = (d_pred >= k_lo) & (d_pred <= k_hi)
counts = np.bincount(d_pred[mask] - k_lo, minlength=len(ks))
pmf_hat = counts / d_pred.size

fig = go.Figure()
fig.add_trace(go.Bar(x=ks, y=pmf_hat, name="posterior predictive"))
fig.update_layout(
    title="Posterior predictive distribution of difference (Gamma–Poisson model)",
    xaxis_title="k",
    yaxis_title="Probability",
)
fig.show()

{
    "posterior_mean_mu1": float(mu1_s.mean()),
    "posterior_mean_mu2": float(mu2_s.mean()),
    "predictive_mean_diff": float(d_pred.mean()),
    "predictive_var_diff": float(d_pred.var(ddof=0)),
    "p(diff>0)": float((d_pred > 0).mean()),
}
{'posterior_mean_mu1': 1.4275728197733406,
 'posterior_mean_mu2': 0.9975202827806741,
 'predictive_mean_diff': 0.43624,
 'predictive_var_diff': 2.7678146623999997,
 'p(diff>0)': 0.46646}
# Generative modeling: Skellam innovations in an integer-valued random walk
T = 200
mu_plus, mu_minus = 2.0, 1.5

increments = sample_skellam_numpy(mu_plus, mu_minus, size=T, rng=rng)
path = np.cumsum(increments)

fig = go.Figure()
fig.add_trace(go.Scatter(y=path, mode="lines", name="x_t"))
fig.update_layout(
    title="Random walk with Skellam innovations",
    xaxis_title="t",
    yaxis_title="x_t",
)
fig.show()

{
    "drift_theory": mu_plus - mu_minus,
    "drift_empirical": float(increments.mean()),
}
{'drift_theory': 0.5, 'drift_empirical': 0.415}

11) Pitfalls#

  • Invalid parameters: (\mu_1, \mu_2) must be (\ge 0). The boundary case (\mu_1=\mu_2=0) is degenerate (all mass at 0).

  • Model misspecification: Skellam assumes independent Poisson counts. Correlation or over/under-dispersion can invalidate p-values and CIs. If overdispersion is present, consider modeling (Y_1) and (Y_2) with Negative Binomial (Gamma–Poisson mixture) or adding random effects.

  • Unequal exposure: if the two counts come from different time windows or populations, interpret (\mu_1) and (\mu_2) as expected counts for those exposures.

  • Numerical issues: the PMF involves a modified Bessel function (I_{|k|}(\cdot)), which grows roughly like (e^{2\sqrt{\mu_1\mu_2}}). Naive evaluation can overflow; prefer logpmf or scaled Bessel forms (as in special.ive).

mu1, mu2 = 200.0, 190.0
k = 0
z = 2.0 * math.sqrt(mu1 * mu2)

iv_val = special.iv(abs(k), z)
ive_val = special.ive(abs(k), z)

# Naive PMF computation can produce inf/0 -> nan for large parameters
pmf_naive = math.exp(-(mu1 + mu2)) * (mu1 / mu2) ** (k / 2) * iv_val

logpmf_stable = skellam_logpmf(k, mu1, mu2)
logpmf_scipy = stats.skellam.logpmf(k, mu1, mu2)

{
    "z": z,
    "iv(abs(k), z)": float(iv_val),
    "ive(abs(k), z)": float(ive_val),
    "pmf_naive": float(pmf_naive) if np.isfinite(pmf_naive) else str(pmf_naive),
    "logpmf_stable": float(logpmf_stable),
    "logpmf_scipy": float(logpmf_scipy),
}

k_far = 600
{
    "k_far": k_far,
    "pmf": float(stats.skellam.pmf(k_far, mu1, mu2)),
    "logpmf": float(stats.skellam.logpmf(k_far, mu1, mu2)),
}
{'k_far': 600, 'pmf': 2.127058179463747e-171, 'logpmf': -392.98731101331043}
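To illustrate the misspecification bullet above: if the underlying counts are overdispersed (here simulated as Negative Binomial with the same means; the dispersion parameter r is an arbitrary choice), the variance of the difference exceeds what a Skellam model predicts.

```python
import numpy as np

rng = np.random.default_rng(0)
mu1, mu2 = 8.0, 5.0
r = 5.0  # NB dispersion: Var = mu * (1 + mu / r); smaller r -> more overdispersion

# NumPy's negative_binomial(n, p) has mean n * (1 - p) / p; p = r / (r + mu) gives mean mu.
y1 = rng.negative_binomial(n=r, p=r / (r + mu1), size=200_000)
y2 = rng.negative_binomial(n=r, p=r / (r + mu2), size=200_000)
d = y1 - y2

skellam_var = mu1 + mu2                                # 13.0 under a (misspecified) Skellam model
true_var = mu1 * (1 + mu1 / r) + mu2 * (1 + mu2 / r)   # 30.8 for the NB model
empirical_var = float(d.var())
```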

12) Summary#

  • Skellam models the difference of two independent Poisson counts: (X=Y_1-Y_2).

  • Support is all integers; parameters (\mu_1,\mu_2\ge 0).

  • Key moments: (\mathbb{E}[X]=\mu_1-\mu_2), (\mathrm{Var}(X)=\mu_1+\mu_2), skewness ((\mu_1-\mu_2)/(\mu_1+\mu_2)^{3/2}).

  • PMF involves a modified Bessel function; use logpmf / scaled Bessel forms for numerical stability.

  • Sampling is easy: draw two Poisson counts and subtract.

  • Common uses include score differentials, net event counts, and integer-valued innovations in generative models.